This study focuses on implementing a supervised learning algorithm for detecting attacks in cloud computing 
environments. The main objective is distinguishing between "Normal" and "Attack" 
statuses based on a carefully curated dataset. Several metrics, such as the kappa statistic, 
F1-score, recall, accuracy, and precision, are employed to evaluate the model's 
performance. The dataset utilized in this research comprises 9594 cases and encompasses 
44 distinct columns, each representing specific features relevant to cloud computing 
security. Through a rigorous evaluation process, the algorithm demonstrates exceptional 
efficiency, achieving a remarkable detection rate of over 99%. Such high accuracy in 
identifying attacks is crucial for ensuring the integrity and security of cloud-based systems. 
The significance of this study lies in its successful application of a supervised learning 
approach to tackle cloud computing security challenges effectively. The model's high 
detection rate and efficiency indicate its potential for real-world deployment in cloud-based 
systems, contributing to enhanced threat detection and mitigation. These results hold 
promising implications for bolstering the security measures of cloud computing platforms 
and safeguarding sensitive data and services from potential attacks. 


Introduction 

Cloud computing is the on-demand 
provisioning of computing resources over the Internet, 
including servers, storage, databases, networking, and 
software (Butt et al., 2023). It can 
be easily scaled up or down based on demand, allowing 
for flexibility and cost optimization. The responsibility 
for infrastructure maintenance, security patches, and 
updates lies with the cloud provider. Amazon Web 
Services, Microsoft Azure, Google Cloud Platform, IBM 
Cloud, and Oracle Cloud are popular cloud service 
providers. There are three types of cloud deployment 
models. A public cloud is available for public use. 
In a private cloud, infrastructure and services are 
dedicated to a single organization and can be hosted 
internally or externally. A hybrid cloud combines public 
and private clouds, allowing data and applications to be 
shared between them (Jain and Rajak, 2023). 

Figure 1 illustrates how cloudlets, brokers, data centers, 
and services work together to provide efficient and 
accessible cloud services. The process begins when a user 
initiates a request for a specific cloud service or 
application. If a cloudlet is nearby, the request is 
forwarded to it, enabling faster processing and reduced 
latency. Based on its evaluation, the broker selects the 
most suitable cloud service provider and communicates 
the user's request and service selection to the chosen data 
center. The data center receives the request, allocates the 
necessary computing resources, and executes the 
requested service or application. This collaboration 
among cloudlets, brokers, data centers, and services 
optimizes the delivery of cloud services, reduces latency, 
enhances user experience, and gives users efficient 
access to a wide range of computing resources and 
applications. Some of the security attacks that threaten 
cloud services are as follows. 


Figure 1. Cloud Computing Architecture 


Denial of Service (DoS) Attacks 

These attacks can disrupt the normal functioning of 
cloud services, leading to downtime, loss of access, and 
potential financial and reputational damages for both the 
cloud service provider and its customers (George et al., 
2023; Gemmer et al., 2023; Clemens et al., 2023; Ashlam 
et al., 2023; Anita et al., 2023). 

Side Channel Attacks 

These attacks take advantage of the physical 
characteristics of systems, such as timing and power 
consumption, to extract sensitive information (Yan et al., 
2015; Sahi et al., 2017; Agarwal et al., 2019; Gopinath 
et al., 2023). 

Man-In-The-Middle Cryptographic Attacks 

In these attacks, the attacker captures or alters the 
transmitted data, including sensitive user credentials, 
financial information, or critical application data (Lu et 
al., 2021; Ha et al., 2022; Zhang et al., 2019; Sultan et 
al., 2022). 

Common countermeasures include strong authentication 
and access controls, data encryption, regular security 
updates and patching, and network security measures 
(Ma et al., 2023; Gong et al., 2020) such as firewalls and 
intrusion detection and prevention systems (Radhakishan 
et al., 2011; Ren et al., 2020a,b). Moreover, network 
segmentation protects cloud networks from unauthorized 
access and malicious activities. 

Machine Learning (ML) is a subset of artificial 
intelligence (Joshi et al., 2023) that involves the 
development of algorithms and models that enable 
computers to learn from data and make predictions. 
Such algorithms automatically learn and improve 
from experience instead of relying on explicit 
instructions. Image and speech recognition, natural 
language processing (NLP) (Dash et al., 2023; Khurana 
et al., 2023), anomaly detection, predictive analytics, and 
similar tasks are performed using ML (Kreuzberger et al., 
2023; Kwekha-Rashid et al., 2023). Supervised learning 
is a machine learning approach that involves training 
models on labeled data, where each data point is 
associated with a known outcome. Supervised learning 
aims to learn a mapping function that can accurately 
predict or classify new, unseen instances based on their 
features (Yu et al., 2023; Wu et al., 2023). Regression 
and classification are the two types of supervised 
learning. Classifiers such as support vector machine 
(SVM) (Kurani et al., 2023), Random Forest (Bicego et 
al., 2023), Decision Tree (Utukuru et al., 2023), Logistic 
Regression (Wang et al., 2023), Naive Bayes (Saleh et 
al., 2023), eXtreme Gradient Boosting (XGBoost) (Iban 
et al., 2023), and K-Nearest Neighbour (K-NN) 
(Mohy-Eddie et al., 2023) are examples of supervised 
learning. 
Aldhyani et al. (2022) addressed EDoS attacks in cloud 
computing using ML and DL methods, namely SVM, 
KNN, RF, and deep learning. Arunkumar et al. (2023) 
proposed a Gannet Optimization Algorithm-based hybrid 
SVM-ELM to mitigate attacks in the cloud. MATLAB is 
used for simulation on the CICIDS2017 dataset, and a DL 
fusion-based method is proposed to solve DDoS issues in 
cloud computing; more optimized DL methods can be 
deployed for better attack detection in the cloud. Patel et 
al. (2022) reviewed DL methods for attack detection in 
networks. Verma et al. (2023) proposed the RepuTE 
method for DDoS attack detection in IoT and fog 
computing; classifiers will be designed to counter future 
attacks on live IoT network traffic. A text-mining 
approach has been proposed for accident analysis in steel 
plants to reduce human involvement in accident-prone 
areas. The text mining is done in two phases, and four 
years of data are used for model building and analysis. 
However, there is scope for future enhancement using 
different ML classifiers. The authors use SVM, ANN, 
NB, K-NN, and RF classifiers for predicting and 
analyzing injury severity (Sarkar et al., 2020). In future 
studies, accidents could also be predicted using time 
series methods. Other relevant papers include Sarkar et 
al. (2019), Pramanik et al. (2021), Das et al. (2022), 
Paramanik et al. (2022), Bag et al. (2023), and Dey et al. 
(2023). 
This research aims to develop and implement 
supervised machine learning algorithms to detect 
intrusion attempts in a cloud network. The objective is to 
accurately classify instances as either 'Normal' or 'Attack' 
to enhance the security of cloud computing. This work 
aims to design and implement an anomaly detection-based 
network intrusion detection system for a cloud 
computing network that can detect as many potential 
security issues as feasible. 


Table 1. Pseudocode for Logistic Regression 
function Logistic_Regression_Train(dataset, labels): 
    weights = initialize_weights() 
    bias = initialize_bias() 
    for iteration = 1 to num_iterations do: 
        gradients = compute_gradients(dataset, labels, weights, bias) 
        weights -= learning_rate * gradients 
        bias -= learning_rate * compute_bias_gradient(gradients) 
    return weights, bias 

function Logistic_Regression_Predict(dataset, weights, bias): 
    predictions = [] 
    for instance in dataset: 
        prediction = sigmoid(dot_product(weights, instance) + bias) 
        predictions.append(prediction) 
    return predictions 

function sigmoid(x): 
    return 1 / (1 + exp(-x)) 


Research Questions 

Q1. What precautions must a user consider before 
adopting cloud computing? 

Q2. How can data be secured while it is being 
transferred to the cloud? 

Q3. How can one make sure that data stored in the 
cloud is secure? 


Contribution 

This study focuses on hidden security attacks on cloud 
service providers (CSPs) that degrade the quality of 
service and waste cloud resources and client money, 
since the cloud works on a pay-per-use model. Detecting 
cloud attacks is very difficult because it involves a large 
volume of real-time traffic. Our model helps separate 
normal packets from abnormal packets (attacks) in 
network traffic using different ML classifiers, which is 
this paper's main contribution. In this experiment, we use 
an actual dataset from a cloud server. The results are 
compared with different existing systems to demonstrate 
this system's durability and efficiency. 


Materials and Method 

Table 1 outlines the logistic regression procedure: 
initializing the weights and bias, iterating for a fixed 
number of iterations, computing gradients, updating the 
weights and bias using the learning rate, and using the 
sigmoid function for prediction. The "dot_product" 
function calculates the dot product of the weights and an 
instance of the dataset. 
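The procedure in Table 1 can be sketched in plain Python as follows. This is a minimal illustration, not the MATLAB implementation used in the experiments. The gradient formula (the mean gradient of the cross-entropy loss), the zero initialization, the default learning rate, and the toy data are all assumptions, because Table 1 leaves `compute_gradients` and the initializers unspecified.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot_product(weights, instance):
    """Dot product of the weight vector and one feature vector."""
    return sum(w * v for w, v in zip(weights, instance))


def train(dataset, labels, learning_rate=0.1, num_iterations=2000):
    """Batch gradient descent on the mean cross-entropy loss (assumed)."""
    n = len(dataset)
    weights = [0.0] * len(dataset[0])  # assumed zero initialization
    bias = 0.0
    for _ in range(num_iterations):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for instance, label in zip(dataset, labels):
            # error = predicted probability minus the 0/1 label
            error = sigmoid(dot_product(weights, instance) + bias) - label
            for j, value in enumerate(instance):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(dataset, weights, bias, threshold=0.5):
    """Return 1 (Attack) or 0 (Normal) for each instance."""
    return [
        1 if sigmoid(dot_product(weights, x) + bias) >= threshold else 0
        for x in dataset
    ]


# Hypothetical one-feature toy data: label 1 stands for Attack, 0 for Normal.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train(X, y)
print(predict(X, weights, bias))
```

Unlike the pseudocode, `predict` thresholds the sigmoid output at 0.5 so that it returns class labels rather than probabilities; the threshold is a tunable assumption.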

Figure 2 shows the working of the proposed model. 
First, the dataset is pre-processed. The dataset is then 
simulated in MATLAB R2023a using supervised machine 
learning classifiers such as LR, SVM, DT, RF, and 
XGBoost. Parameters such as accuracy, precision, F1 
score, and recall were derived. A system with 16 GB of 
RAM, 1 TB of storage, and MATLAB R2023a was used 
to perform these experiments. 

Figure 2. Proposed methodology flowchart (pre-processed data; SVM, LR, DT, RF, and XGBoost classifiers; accuracy, precision, F1 score, recall, and AUC curve) 

A private cloud was used as the setting for 
the creation of the dataset. The private cloud 
infrastructure was set up with a KVM type-1 
hypervisor and the OpenNebula (version 5.12) cloud 
management platform. On cloud-based virtual machines, 
a script was run to generate a synthetic workload 
replicating an actual cloud workload in real time. We 
split our dataset in a 70:30 ratio during data 
pre-processing. The dataset is then prepared for training 
and testing by removing duplicates and outliers. We 
removed some extra features from the dataset to reduce 
the time required to process the data. Irrelevant features 
and unused variables, such as the time stamp, virtual 
machine identity, unique domain identifier, and domain 
name, were removed. A distinct dataset containing the 
remaining characteristics was developed following 
pre-processing, and the subsequent tests were done on 
this dataset. In total, the dataset contains 9594 cases and 
44 different columns. The first four columns hold 
metadata: the epoch time, the virtual machine ID, the 
domain name, and the domain identifier. 
Two columns provide specific information about the 
available network, RAM, and disk space. The last 
column records whether the target is currently being 
attacked or functioning normally. Models are trained 
using classifiers such as SVM, RF, LR, DT, K-NN, 
XGBoost, and NB. Accuracy, precision, recall, F1-score, 
and the kappa statistic are evaluated for each model. 

Table 2. Feature selection for attack status 
LAST_POLL    rxbytes_slope  rxpackets_slope  txpackets_slope  timesys_slope  Status 
1604624102   87.8402        27.2996          89.9974          17.8787        Attack 
1604624071   87.9098        28.0725          89.9974          18.4349        Attack 
1604624041   88.3794        30.3791          89.9974          34.5923        Attack 
1604624012   88.0519        29.5388          89.9974          18.4349        Attack 
1604623982   87.9098        28.0725          89.9974          18.4349        Attack 
1604623952   87.9098        28.0725          89.9969          18.4349        Attack 
1604623922   88.0114        29.5388          89.9973          33.6901        Attack 
1604623892   88.0519        29.5388          89.9974          18.4349        Attack 
1604623862   88.9491        37.7468          89.9971          17.8787        Attack 
1604623831   87.9098        28.0725          89.9959          18.4349        Attack 
1604623772   87.9546        28.0725          89.9974          18.4349        Attack 
1604623742   87.9098        28.0725          89.9970          33.6901        Attack 
1604623712   88.0114        29.5388          89.9968          18.4349        Attack 
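The pre-processing described in this section (dropping metadata columns, removing duplicates, and splitting 70:30) might look like the sketch below. It uses only the Python standard library rather than MATLAB. The metadata column names other than LAST_POLL, the helper names, the fixed seed, and the synthetic rows are hypothetical; outlier removal is omitted because the criterion is not stated in the text.

```python
import random

# Hypothetical names for the four metadata columns described in the text.
METADATA_COLUMNS = ("LAST_POLL", "VM_ID", "UUID", "DOMAIN_NAME")


def preprocess(rows, drop_columns=METADATA_COLUMNS):
    """Drop metadata columns, then remove exact duplicate rows (order kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        key = tuple(sorted(kept.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            cleaned.append(kept)
    return cleaned


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle with a fixed seed and split into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic rows for illustration only; they are not taken from the dataset.
rows = [
    {
        "LAST_POLL": 1604624102 + i,
        "VM_ID": "vm-1",
        "rxbytes_slope": 87.0 + (i % 5),
        "Status": "Attack" if i % 4 == 0 else "Normal",
    }
    for i in range(20)
]

cleaned = preprocess(rows)
train_rows, test_rows = train_test_split(cleaned)
print(len(rows), len(cleaned), len(train_rows), len(test_rows))
```

In this sketch duplicates are removed before the split, so that identical rows cannot end up in both the training and the test set; this ordering is a design choice of the example.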


Result and Discussion 

Table 2 and Table 3 show the selected features, such 
as LAST_POLL, rxbytes_slope, and txpackets_slope, with 
their values for the Attack and Normal statuses, 
respectively. 

LAST_POLL: This column represents the date and time of 
the last poll, stored as an epoch timestamp. 
rxbytes_slope: This column represents the slope, or rate 
of change, of received bytes over time. 
rxpackets_slope: This column represents the rate of 
change of received packets over time. 
txpackets_slope: This column represents the rate of change 
of transmitted packets over time. 
timesys_slope: This column represents the rate of change 
of system time over time. 
Status: This column represents the state (Attack or 
Normal) of the device being monitored. 
The values in each column reflect the respective 
characteristics at that time. 

Table 3 shows samples of the Normal category from the 
dataset. In this research, the accuracy of four different 
ensemble classifiers is analyzed and evaluated using the 
area under the curve as the performance metric of choice, 
comparing imbalanced data with oversampled data. 


Figure 3. Distribution of the target variable (Attack, Normal) 


Figure 3 shows the categorization of the data into 
Normal and Attack in pie-chart form. The Attack class 
has 2306 cases, while the Normal class has 7288. 
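The class counts above imply a noticeable imbalance, which is why imbalanced and oversampled data are compared. The sketch below works out the class shares and shows one simple way to balance the classes. Random oversampling with replacement is an assumption made here for illustration; the text does not state which oversampling method was used, and the function name is hypothetical.

```python
import random
from collections import Counter

# Class counts as reported in the text.
counts = {"Attack": 2306, "Normal": 7288}
total = sum(counts.values())
shares = {label: 100.0 * n / total for label, n in counts.items()}
imbalance_ratio = counts["Normal"] / counts["Attack"]


def random_oversample(labels, seed=0):
    """Return row indices in which every class is as large as the largest one.

    Minority-class indices are resampled with replacement (an assumed method).
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    target = max(len(indices) for indices in by_class.values())
    selected = []
    for indices in by_class.values():
        selected.extend(indices)
        selected.extend(rng.choices(indices, k=target - len(indices)))
    return selected


labels = ["Attack"] * counts["Attack"] + ["Normal"] * counts["Normal"]
balanced = Counter(labels[i] for i in random_oversample(labels))

print(total, round(shares["Attack"], 2), round(shares["Normal"], 2))
print(round(imbalance_ratio, 2), dict(balanced))
```

Oversampling would normally be applied to the training portion only, so that resampled copies of a row do not leak into the test set.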

Figure 4 shows a strong correlation between the status 
and the pace at which packets are sent over the network. 
It illustrates the link between features. The present state 
is significantly related to the amount of data transmitted 
over the network. The status positively correlates with the 
following metrics: the number of network packets 
received per second, the number of network bytes 
received per second, the number of network bytes 
transferred per second, the usage of kernel space, and 
user space. 
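A correlation of this kind between a numeric feature and the binary status can be computed with the Pearson coefficient, as in the following sketch. Encoding Attack as 1 and Normal as 0 is an assumption, as is the choice of Pearson correlation itself; the text does not say how the heatmap values were obtained. The six sample values are the txpackets_slope entries of the first three rows of Table 2 and of Table 3.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (spread_x * spread_y)


# txpackets_slope values: three Attack rows (Table 2), three Normal rows (Table 3).
txpackets_slope = [89.9974, 89.9974, 89.9974, 24.3045, 15.9061, 32.8285]
status = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = Attack, 0 = Normal

r = pearson(txpackets_slope, status)
print(round(r, 3))
```

Six rows are far too few to draw conclusions from; the example only shows how one cell of a feature correlation heatmap could be produced.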


Table 3. Feature selection for Normal status 
LAST_POLL    rxbytes_slope  rxpackets_slope  txpackets_slope  timesys_slope  Status 
1604455173   88.20650       30.14140         24.3045          89.9850        Normal 
1604455142   87.87080       27.34990         15.9061          89.8986        Normal 
1604455113   87.88650       27.29960         32.8285          89.9897        Normal 
1604455082   87.87600       27.40760         14.2360          89.8741        Normal 
1604455055   87.72410       25.82100         22.7510          89.9864        Normal 
1604455024   87.71280       25.71000         18.4349          89.9685        Normal 
1604454997   88.10170       29.74490         23.1986          89.9864        Normal 
1604454962   87.87600       27.40760         12.5288          89.9580        Normal 
1604454935   87.72860       25.86640         16.8584          89.9829        Normal 
1604454902   87.79740       26.56510         21.8014          89.9818        Normal 
1604454580   87.79740       26.56510         32.2756          89.9887        Normal 
1604454542   87.87080       27.34990         17.8533          89.9324        Normal 
1604454513   87.79740       26.56510         33.6901          89.9907        Normal 

Feature Correlation Heatmap 

Figure 5 explains the statistical technique of 
classifying objects, data points, or clusters based on their 
similarities or dissimilarities. Cluster characteristics and 
differences between clusters can be analyzed to 
understand patterns or groupings in the dataset, and they 
can be interpreted to gain insights into the structure of the 
data. This helps discover hidden structures, identify 
similar groups, and facilitate further analysis or 
decision-making based on the identified clusters. 

Figure 6 shows the precision, recall, F1 score, accuracy, 
and kappa statistic for Logistic Regression, SVM, Naive 
Bayes, K-NN, Grid Search Decision Tree, Random 
Forest, and XGBoost. 

Table 4 compares the accuracy percentage of our work 
with that of other proposed models. We achieved an 
accuracy of 99.04%, better than many other proposed 
models. Our model is compared with those of Aldhyani 
et al. (2022), Fazlullah et al. (2023), Sagarkumar (2023), 
and GSR et al. (2023), whose reported accuracies include 
86.23% and 96.53%; the comparison is also presented in 
Figure 7. Table 5 compares the precision percentage of 
our work with that of other proposed models. We 
achieved a precision of 95.06%, better than many other 
proposed models; Fazlullah et al. (2023) and Emil et al. 
(2023) report 95.06% and 86.48%, respectively. 
Figure 8 shows the graphical representation of our model 
and other existing methods regarding precision 
percentage. 
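The evaluation metrics used in this study (accuracy, precision, recall, F1 score, and the kappa statistic) can all be derived from a binary confusion matrix. The following sketch shows the standard formulas, with Cohen's kappa as the kappa statistic. The function names and the ten example labels are hypothetical and are not results from the experiments.

```python
def confusion_counts(y_true, y_pred, positive="Attack"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="Attack"):
    """Return accuracy, precision, recall, F1 score, and Cohen's kappa."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical labels for illustration only.
y_true = ["Attack"] * 4 + ["Normal"] * 6
y_pred = ["Attack", "Attack", "Attack", "Normal",
          "Attack", "Normal", "Normal", "Normal", "Normal", "Normal"]

metrics = evaluate(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Kappa corrects accuracy for the agreement expected by chance, which makes it a useful companion to accuracy when the Attack and Normal classes are imbalanced.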
Table 4. Performance Evaluation on Accuracy Parameter 
Reference                Accuracy (%) 
Our Model                99.04 
Aldhyani et al., 2022    86.23 
Aldhyani et al., 2022    89.84 
Aldhyani et al., 2022    97.54 
Aldhyani et al., 2022    97.50 
Khan et al., 2023        96.53 
Khan et al., 2023        94.05 
Khan et al., 2023        91.41 
Khan et al., 2023        86.72 
Khan et al., 2023        94.32 
Khan et al., 2023        95.46 
Khan et al., 2023        97.69 
Khan et al., 2023        98.56 
Khan et al., 2023        907.37 
Khan et al., 2023        96.33 
GSR et al., 2023         92.00 
GSR et al., 2023         89.89 
[illegible]              85.00 
Patel, 2023              90.00 
Patel, 2023              89.00 
Patel, 2023              91.00 
Patel, 2023              92.00 
Patel, 2023              93.00 
Patel, 2023              95.00 

Table 5. Model comparison in terms of Precision parameter 
Reference                Precision (%) 
Our Model                95.06 
[illegible]              [illegible] 
[illegible]              [illegible] 
Khan et al., 2023        93.83 
Khan et al., 2023        94.74 
Khan et al., 2023        92.33 
Khan et al., 2023        91.99 
GSR et al., 2023         86.48 
GSR et al., 2023         83.66 

Figure 6. Comparison of Precision among various existing methods 


Conclusion 

We present a way of detecting cloud attacks using a 
supervised learning technique. Our model gives 99.04% 
accuracy, so it can be used in many practical scenarios, 
as discussed below. As cloud computing has emerged as 
a new technological advancement and most businesses 
are deploying cloud services to boost their business, the 
cloud is becoming increasingly vulnerable to 
cryptographic attacks. These attacks can affect the 
smooth working of a business and can even lead to the 
stealing of relevant organizational information. Our 
model presents a supervised learning technique to detect 
cloud attacks with an accuracy of 99.04% and a precision 
of 95.06%. Classifiers such as Logistic Regression, 
support vector machine (SVM), Random Forest, Decision 
Tree, Naive Bayes, eXtreme Gradient Boosting 
(XGBoost), and K-Nearest Neighbour (K-NN) are used 
in our experimental work. The model can help prevent a 
cloud attack if deployed in an actual scenario. In the 
future, this model can be used to detect specific cloud 
attacks like Cross Site Scripting (XSS) and SQL Injection 
attacks. 